Model-Free Neural Counterfactual Regret Minimization with Bootstrap Learning

Abstract

Counterfactual Regret Minimization (CFR) has achieved many fascinating results in solving large-scale Imperfect Information Games (IIGs). Neural network approximation CFR (neural CFR) is one of the promising techniques that can reduce computation and memory consumption by generalizing decision information between similar states. Current neural CFR algorithms have to approximate cumulative regrets. However, efficient and accurate approximation in a large-scale IIG is still a tough challenge. In this paper, a new CFR variant, Recursive CFR (ReCFR), is proposed. In ReCFR, Recursive Substitute Values (RSVs) are learned and used to replace cumulative regrets. It is proven that ReCFR can converge to a Nash equilibrium at a rate of O(1/√T). Based on ReCFR, a new model-free neural CFR with bootstrap learning, Neural ReCFR-B, is proposed. Due to the recursive and non-cumulative nature of RSVs, Neural ReCFR-B has lower-variance training targets than other neural CFRs. Experimental results show that Neural ReCFR-B is competitive with state-of-the-art neural CFR algorithms at a much lower training cost.
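For context on what "cumulative regrets" means here: standard tabular CFR keeps a running regret total per action at each information set and derives the next strategy from it by regret matching. The sketch below shows only that baseline step for a single information set. It is not ReCFR or the RSV update, which this abstract does not specify, and the function names `regret_matching` and `update_regrets` are hypothetical.

```python
def regret_matching(cumulative_regrets):
    """Return a strategy proportional to the positive cumulative regrets."""
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No action has positive regret: fall back to the uniform strategy.
    n = len(cumulative_regrets)
    return [1.0 / n] * n


def update_regrets(cumulative_regrets, action_values, strategy):
    """Add this iteration's instantaneous regrets to the running totals."""
    # Value of playing the current strategy at this information set.
    expected = sum(p * v for p, v in zip(strategy, action_values))
    return [r + (v - expected) for r, v in zip(cumulative_regrets, action_values)]
```

Because the totals grow with every iteration, a network trained to predict them has to track an ever-changing, accumulating target, which is the difficulty the abstract attributes to current neural CFR methods.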


Similar Resources

An Introduction to Counterfactual Regret Minimization

In 2000, Hart and Mas-Colell introduced the important game-theoretic algorithm of regret matching. Players reach equilibrium play by tracking regrets for past plays, making future plays proportional to positive regrets. The technique is not only simple and intuitive; it has sparked a revolution in computer game play of some of the most difficult bluffing games, including clear domination of ann...


Counterfactual Regret Minimization for Decentralized Planning

Regret minimization is an effective technique for almost surely producing Nash equilibrium policies in coordination games in the strategic form. Decentralized POMDPs offer a realistic model for sequential coordination problems, but they yield doubly exponential sized games in the strategic form. Recently, counterfactual regret has offered a way to decompose total regret along a (extensive form)...


Counterfactual Regret Minimization in Sequential Security Games

Many real world security problems can be modelled as finite zero-sum games with structured sequential strategies and limited interactions between the players. An abstract class of games unifying these models are the normal-form games with sequential strategies (NFGSS). We show that all games from this class can be modelled as well-formed imperfect-recall extensive-form games and consequently can...


Generalized Sampling and Variance in Counterfactual Regret Minimization

In large extensive form games with imperfect information, Counterfactual Regret Minimization (CFR) is a popular, iterative algorithm for computing approximate Nash equilibria. While the base algorithm performs a full tree traversal on each iteration, Monte Carlo CFR (MCCFR) reduces the per iteration time cost by traversing just a sampled portion of the tree. On the other hand, MCCFR’s sampled v...


Using counterfactual regret minimization to create competitive multiplayer poker agents

Games are used to evaluate and advance Multiagent and Artificial Intelligence techniques. Most of these games are deterministic with perfect information (e.g. Chess and Checkers). A deterministic game has no chance element and in a perfect information game, all information is visible to all players. However, many real-world scenarios with competing agents are stochastic (non-deterministic) with...


Journal

Journal title: IEEE Transactions on Games

Year: 2022

ISSN: 2475-1502, 2475-1510

DOI: https://doi.org/10.1109/tg.2022.3158649